Data import

Import the data sets produced by the NaPPI_Data Preparation R Markdown document.

## [1] "endpoint.txt"            "NaPPI_DataAnalysis.html"
## [3] "NaPPI_DataAnalysis.Rmd"  "plant_info.txt"         
## [5] "S_timeseries.txt"        "T_timeseries.txt"       
## [7] "testtemplate"            "testtemplatemod.gif"    
## [9] "timeseries.txt"

We must convert the columns to factor and date formats.
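A minimal sketch of the import and conversion step. It assumes tab-delimited files and hypothetical column names (Genotype, Date); the actual separator and columns of files such as endpoint.txt are not shown in this report, so a small temporary file stands in for them.

```r
# Toy stand-in for a file such as endpoint.txt (contents are illustrative only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Genotype\tDate\tDW_shoot_g",
             "EPPN01_H\t2020-07-05\t33.8",
             "EPPN20_T\t2020-07-05\t30.4"), tmp)

# Read the table; sep = "\t" is an assumption about the file format
endpoint_toy <- read.table(tmp, header = TRUE, sep = "\t",
                           stringsAsFactors = FALSE)

# Convert identifier columns to factors and date columns to Date
endpoint_toy$Genotype <- as.factor(endpoint_toy$Genotype)
endpoint_toy$Date <- as.Date(endpoint_toy$Date, format = "%Y-%m-%d")

str(endpoint_toy)
```

Converting explicitly after reading, rather than relying on automatic type guessing, keeps the factor levels and date format under control.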

1. Endpoint dataframe

A. Exploration of data

List of variables

This part extracts the variables in the endpoint dataframe.

## [1] "DW_shoot_g" "FW_shoot_g"

The variables for NaPPI are "DW_shoot_g" and "FW_shoot_g".

Exploration tables using the janitor and skimr packages

## # A tibble: 2 × 10
##   variable       n   min   max median   iqr  mean    sd    se    ci
##   <fct>      <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 DW_shoot_g   125  17.7  117    33.9  9.51  35.9  12.3  1.10  2.19
## 2 FW_shoot_g   125  13.2  347.  177.  89.0  178.   67.1  6.00 11.9
##    Genotype  n
## 1  EPPN01_H  9
## 2  EPPN02_H  9
## 3  EPPN03_H 10
## 4  EPPN04_H  7
## 5  EPPN05_H 13
## 6  EPPN06_H 18
## 7  EPPN07_L  2
## 8  EPPN08_H  7
## 9  EPPN09_H 11
## 10 EPPN10_H  7
## 11 EPPN10_L  2
## 12 EPPN11_H 10
## 13 EPPN11_L  3
## 14 EPPN12_H 10
## 15 EPPN13_H  5
## 16 EPPN20_T  3
Data summary
Name endpoint[, unlist(variabl…
Number of rows 126
Number of columns 2
_______________________
Column type frequency:
numeric 2
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DW_shoot_g 1 0.99 35.93 12.35 17.73 28.79 33.94 38.30 117.0 ▇▂▁▁▁
FW_shoot_g 1 0.99 177.54 67.14 13.23 130.80 177.25 219.75 346.9 ▂▆▇▅▂
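As a check on the summary table, the se and ci columns can be recomputed from n and sd. The sketch below assumes that ci is the half-width of a t-based 95% confidence interval for the mean; that definition is not stated in the report.

```r
# Values reported in the summary tables above
n     <- 125
sd_dw <- 12.35   # DW_shoot_g
sd_fw <- 67.14   # FW_shoot_g

# Standard error of the mean
se_dw <- sd_dw / sqrt(n)
se_fw <- sd_fw / sqrt(n)

# Half-width of a t-based 95% confidence interval (assumed definition of ci)
t_crit <- qt(0.975, df = n - 1)
ci_dw  <- t_crit * se_dw
ci_fw  <- t_crit * se_fw

round(c(se_dw = se_dw, ci_dw = ci_dw, se_fw = se_fw, ci_fw = ci_fw), 2)
```

The results agree with the reported se and ci values up to rounding of the inputs, which supports the assumed definition.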

Boxplots, density histograms and qqPlots

## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).
## Removed 1 rows containing non-finite values (`stat_boxplot()`).

B. Normality hypothesis and outlier detection

Outliers are replaced with missing values (NA), and the normality of the residuals is then checked.
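A minimal sketch of this step on toy values. It assumes the 1.5 x IQR boxplot rule for flagging outliers and applies the Shapiro-Wilk test directly to the cleaned values; the rule actually used in this report is not shown, and the report checks model residuals rather than raw values.

```r
x <- c(30, 32, 31, 29, 33, 28, 34, 117, 31, 30)  # toy values, one extreme

# Values beyond 1.5 x IQR from the hinges (the boxplot rule)
out <- boxplot.stats(x)$out

# Replace flagged values with NA (missing values in R are NA, not NULL)
x_clean <- x
x_clean[x_clean %in% out] <- NA

# Shapiro-Wilk normality test; NA values are dropped by the function
shapiro.test(x_clean)
```

Replacing values with NA instead of deleting rows keeps the data frame aligned with the experimental layout (row and column positions).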

## 117 121 
## 116 120
## [1] 47.30 49.56 40.96 27.85

##  31 105 
##  30 104
## [1] 198.50 325.94 208.54 106.21

Violin and sina plots after outlier detection

NOTE: CHANGE THE VARIABLE NAMES HERE

## Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).
## Warning: Removed 1 rows containing non-finite values (`stat_sina()`).
## Warning: Removed 1 rows containing non-finite values (`stat_ydensity()`).
## Warning: Removed 1 rows containing non-finite values (`stat_sina()`).

Exploration statistics for the variables after outlier detection

NOTE: CHANGE THE VARIABLE NAMES HERE

Data summary
Name endpoint_clean[, unlist(v…
Number of rows 126
Number of columns 2
_______________________
Column type frequency:
numeric 2
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DW_shoot_g 1 0.99 35.93 12.35 17.73 28.79 33.94 38.30 117.0 ▇▂▁▁▁
FW_shoot_g 1 0.99 177.54 67.14 13.23 130.80 177.25 219.75 346.9 ▂▆▇▅▂
## # A tibble: 16 × 4
##    Genotype  mean std.dev n_missing
##    <fct>    <dbl>   <dbl>     <int>
##  1 EPPN13_H  46.0   17.6          0
##  2 EPPN12_H  44.6   28.7          0
##  3 EPPN09_H  41.9   10.5          0
##  4 EPPN10_H  39.7   17.4          0
##  5 EPPN10_L  36.4    5.29         0
##  6 EPPN08_H  35.2    5.98         0
##  7 EPPN05_H  34.8    9.79         0
##  8 EPPN04_H  34.7    3.91         0
##  9 EPPN06_H  34.1    8.64         0
## 10 EPPN03_H  33.9   10.3          0
## 11 EPPN01_H  33.8    4.87         0
## 12 EPPN11_H  32.6    7.29         1
## 13 EPPN02_H  30.6    4.81         0
## 14 EPPN20_T  30.4    5.07         0
## 15 EPPN07_L  28.7    2.00         0
## 16 EPPN11_L  28.6    7.13         0
## # A tibble: 16 × 4
##    Genotype  mean std.dev n_missing
##    <fct>    <dbl>   <dbl>     <int>
##  1 EPPN13_H  222.    69.5         0
##  2 EPPN10_H  210.    69.0         0
##  3 EPPN08_H  204.   105.          0
##  4 EPPN09_H  194.    43.7         0
##  5 EPPN06_H  189.    85.2         0
##  6 EPPN12_H  182.    55.9         0
##  7 EPPN05_H  181.    71.8         0
##  8 EPPN04_H  181.    40.6         0
##  9 EPPN01_H  180.    45.3         0
## 10 EPPN07_L  165.    61.2         0
## 11 EPPN11_H  164.    63.1         1
## 12 EPPN02_H  145.    43.0         0
## 13 EPPN03_H  143.    79.2         0
## 14 EPPN11_L  133.    59.3         0
## 15 EPPN10_L  132.    40.5         0
## 16 EPPN20_T  132.    39.3         0

C. Statistical models for phenotypic traits

The explanatory variable (X) is the genotype, a categorical variable. The responses (Y) are the phenotypic data (in this case DW_shoot_g and FW_shoot_g).

##  [1] EPPN20_T EPPN06_H EPPN08_H EPPN10_L EPPN05_H EPPN11_H EPPN09_H EPPN04_H
##  [9] EPPN03_H EPPN12_H EPPN10_H EPPN01_H EPPN02_H EPPN11_L EPPN13_H EPPN07_L
## 16 Levels: EPPN01_H EPPN02_H EPPN03_H EPPN04_H EPPN05_H EPPN06_H ... EPPN20_T
## [1] "DW_shoot_g" "FW_shoot_g"

NOTE: CHANGE THE VARIABLES HERE

1. First linear models

First, we fit the model Y = X + r + c + e, where:
- Y is the phenotypic trait;
- X is the genotype;
- r is the row effect (fixed or random);
- c is the column effect (fixed or random);
- e is the residual error.

Models for DW_shoot_g and FW_shoot_g with fixed or random effects of Row and Column.
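The two variants can be sketched on simulated data as follows. The formulas are the ones that appear in the model output of this section; the data frame `toy` and its values are purely illustrative, and the mixed-model part runs only if the lmerTest package is installed.

```r
set.seed(1)  # toy data only; values are simulated, not NaPPI measurements
toy <- expand.grid(Row = factor(1:7), Column = factor(1:20))
toy$Genotype   <- factor(sample(paste0("G", 1:16), nrow(toy), replace = TRUE))
toy$DW_shoot_g <- rnorm(nrow(toy), mean = 35, sd = 12)

# Variant 1: Row and Column as fixed effects
fit_fixed <- lm(DW_shoot_g ~ Genotype + Row + Column, data = toy)
anova(fit_fixed)

# Variant 2: Row and Column as random intercepts (requires lmerTest)
if (requireNamespace("lmerTest", quietly = TRUE)) {
  fit_random <- lmerTest::lmer(DW_shoot_g ~ Genotype + (1 | Row) + (1 | Column),
                               data = toy)
  # Likelihood-ratio tests for dropping each random term
  print(lmerTest::ranova(fit_random))
}
```

A "boundary (singular) fit" message from the mixed model indicates that at least one variance component was estimated at or near zero, in which case the corresponding random term adds little to the model.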

## 
## Call:
## lm(formula = DW_shoot_g ~ Genotype + Row + Column, data = endpoint_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.874  -6.271  -1.509   5.519  50.723 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       28.2396     7.7544   3.642 0.000467 ***
## GenotypeEPPN02_H  -2.9902     6.1086  -0.490 0.625752    
## GenotypeEPPN03_H  -1.3912     5.8232  -0.239 0.811766    
## GenotypeEPPN04_H   2.6873     6.4388   0.417 0.677484    
## GenotypeEPPN05_H   0.8991     5.5722   0.161 0.872207    
## GenotypeEPPN06_H   1.4508     5.2469   0.277 0.782832    
## GenotypeEPPN07_L  -8.1686    10.3948  -0.786 0.434176    
## GenotypeEPPN08_H   2.7766     6.5691   0.423 0.673614    
## GenotypeEPPN09_H   9.7024     5.8961   1.646 0.103592    
## GenotypeEPPN10_H   4.2165     6.4622   0.652 0.515869    
## GenotypeEPPN10_L  11.2486    10.7744   1.044 0.299475    
## GenotypeEPPN11_H  -4.4941     5.9263  -0.758 0.450374    
## GenotypeEPPN11_L  -8.0526     8.7896  -0.916 0.362208    
## GenotypeEPPN12_H  13.1270     5.8280   2.252 0.026904 *  
## GenotypeEPPN13_H  14.4846     7.5473   1.919 0.058357 .  
## GenotypeEPPN20_T   2.3807     8.9236   0.267 0.790288    
## Row2              -4.1285     4.3358  -0.952 0.343734    
## Row3              -0.6911     4.2858  -0.161 0.872283    
## Row4              -0.2334     4.0814  -0.057 0.954530    
## Row5               0.5045     4.2127   0.120 0.904959    
## Row6               1.4742     4.3074   0.342 0.733018    
## Row7              10.4890     4.2989   2.440 0.016793 *  
## Column2            9.9420     7.5028   1.325 0.188728    
## Column3           11.7866     7.3113   1.612 0.110688    
## Column4           -2.1079     7.0850  -0.298 0.766811    
## Column5           -1.6712     7.6078  -0.220 0.826663    
## Column6            7.8853     7.6197   1.035 0.303702    
## Column7           10.6463     7.6410   1.393 0.167201    
## Column8            6.2134     7.3557   0.845 0.400673    
## Column9           14.4216     7.3664   1.958 0.053579 .  
## Column10           4.6916     8.2359   0.570 0.570435    
## Column11          -4.1446     7.9392  -0.522 0.603019    
## Column12           3.0330     7.8764   0.385 0.701158    
## Column13          -1.8445     7.4816  -0.247 0.805865    
## Column14           5.9081     7.7465   0.763 0.447792    
## Column15           4.9084     7.3706   0.666 0.507268    
## Column16          -0.5763     7.8673  -0.073 0.941776    
## Column17           1.1860     7.4026   0.160 0.873095    
## Column18           4.3316     7.4252   0.583 0.561212    
## Column19           1.6794     7.3640   0.228 0.820154    
## Column20           1.2678     7.4573   0.170 0.865415    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.04 on 84 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3557, Adjusted R-squared:  0.04887 
## F-statistic: 1.159 on 40 and 84 DF,  p-value: 0.2815
## Analysis of Variance Table
## 
## Response: DW_shoot_g
##           Df  Sum Sq Mean Sq F value Pr(>F)
## Genotype  15  2632.5  175.50  1.2103 0.2805
## Row        6  1316.1  219.35  1.5128 0.1839
## Column    19  2775.3  146.07  1.0074 0.4617
## Residuals 84 12180.2  145.00
## boundary (singular) fit: see help('isSingular')
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: DW_shoot_g ~ Genotype + (1 | Row) + (1 | Column)
##    Data: endpoint_clean
## 
## REML criterion at convergence: 884.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4674 -0.5997 -0.1208  0.4211  5.7901 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. 
##  Column   (Intercept) 5.573e-12 2.361e-06
##  Row      (Intercept) 4.200e+00 2.049e+00
##  Residual             1.455e+02 1.206e+01
## Number of obs: 125, groups:  Column, 20; Row, 7
## 
## Fixed effects:
##                  Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)       33.8930     4.1012 100.9566   8.264 5.79e-13 ***
## GenotypeEPPN02_H  -3.0693     5.6930 103.4879  -0.539   0.5910    
## GenotypeEPPN03_H  -0.1894     5.5567 104.6532  -0.034   0.9729    
## GenotypeEPPN04_H   0.8041     6.0836 103.1277   0.132   0.8951    
## GenotypeEPPN05_H   0.7398     5.2378 103.6519   0.141   0.8880    
## GenotypeEPPN06_H   0.5284     4.9392 104.9669   0.107   0.9150    
## GenotypeEPPN07_L  -4.7999     9.4796 106.5084  -0.506   0.6137    
## GenotypeEPPN08_H   1.4151     6.0931 104.4440   0.232   0.8168    
## GenotypeEPPN09_H   7.9956     5.4401 105.2978   1.470   0.1446    
## GenotypeEPPN10_H   5.7290     6.1102 106.4489   0.938   0.3506    
## GenotypeEPPN10_L   3.1602     9.4969 107.5285   0.333   0.7400    
## GenotypeEPPN11_H  -1.5713     5.6993 104.4418  -0.276   0.7833    
## GenotypeEPPN11_L  -5.7453     8.0905 106.9824  -0.710   0.4792    
## GenotypeEPPN12_H  10.8092     5.5510 103.8276   1.947   0.0542 .  
## GenotypeEPPN13_H  12.4213     6.8267 108.9198   1.820   0.0716 .  
## GenotypeEPPN20_T  -3.2346     8.0643 104.8364  -0.401   0.6892    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## DW_shoot_g ~ Genotype + (1 | Row) + (1 | Column)
##              npar  logLik    AIC     LRT Df Pr(>Chisq)
## <none>         19 -442.30 922.61                      
## (1 | Row)      18 -442.52 921.04 0.43533  1     0.5094
## (1 | Column)   18 -442.30 920.61 0.00000  1     1.0000
## 
## Call:
## lm(formula = FW_shoot_g ~ Genotype + Row + Column, data = endpoint_clean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -164.647  -39.774    8.291   37.825  146.473 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      132.5600    44.0975   3.006  0.00349 **
## GenotypeEPPN02_H -33.0756    34.7381  -0.952  0.34376   
## GenotypeEPPN03_H -49.2577    33.1153  -1.487  0.14064   
## GenotypeEPPN04_H  -0.5520    36.6162  -0.015  0.98801   
## GenotypeEPPN05_H  -0.1711    31.6877  -0.005  0.99571   
## GenotypeEPPN06_H  12.3357    29.8378   0.413  0.68035   
## GenotypeEPPN07_L -20.8507    59.1128  -0.353  0.72518   
## GenotypeEPPN08_H  19.0290    37.3572   0.509  0.61182   
## GenotypeEPPN09_H  15.2931    33.5299   0.456  0.64949   
## GenotypeEPPN10_H  29.7689    36.7492   0.810  0.42020   
## GenotypeEPPN10_L  -6.3234    61.2717  -0.103  0.91805   
## GenotypeEPPN11_H -20.3117    33.7015  -0.603  0.54834   
## GenotypeEPPN11_L -84.8333    49.9846  -1.697  0.09336 . 
## GenotypeEPPN12_H  12.0368    33.1425   0.363  0.71738   
## GenotypeEPPN13_H  21.1223    42.9196   0.492  0.62391   
## GenotypeEPPN20_T   4.6765    50.7465   0.092  0.92679   
## Row2               7.3553    24.6566   0.298  0.76620   
## Row3              18.2423    24.3725   0.748  0.45626   
## Row4             -11.1830    23.2103  -0.482  0.63119   
## Row5             -18.7620    23.9567  -0.783  0.43573   
## Row6              -2.3768    24.4955  -0.097  0.92293   
## Row7              20.7756    24.4468   0.850  0.39784   
## Column2           82.4323    42.6665   1.932  0.05673 . 
## Column3           91.1612    41.5776   2.193  0.03111 * 
## Column4            3.6414    40.2907   0.090  0.92820   
## Column5           34.0494    43.2639   0.787  0.43349   
## Column6           52.6270    43.3313   1.215  0.22795   
## Column7           51.6726    43.4524   1.189  0.23772   
## Column8           82.8466    41.8303   1.981  0.05092 . 
## Column9           54.2173    41.8912   1.294  0.19913   
## Column10          67.3053    46.8355   1.437  0.15442   
## Column11           7.6135    45.1488   0.169  0.86649   
## Column12          53.3238    44.7915   1.190  0.23721   
## Column13          37.2885    42.5462   0.876  0.38330   
## Column14          18.9326    44.0526   0.430  0.66846   
## Column15          14.4763    41.9148   0.345  0.73068   
## Column16          26.9097    44.7394   0.601  0.54914   
## Column17           9.3940    42.0968   0.223  0.82396   
## Column18          69.7947    42.2255   1.653  0.10208   
## Column19          65.3619    41.8777   1.561  0.12234   
## Column20          64.5650    42.4083   1.522  0.13165   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 68.48 on 84 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2952, Adjusted R-squared:  -0.04039 
## F-statistic: 0.8796 on 40 and 84 DF,  p-value: 0.6679
## Analysis of Variance Table
## 
## Response: FW_shoot_g
##           Df Sum Sq Mean Sq F value Pr(>F)
## Genotype  15  67131  4475.4  0.9544 0.5095
## Row        6  16220  2703.4  0.5765 0.7480
## Column    19  81644  4297.1  0.9164 0.5649
## Residuals 84 393900  4689.3
## boundary (singular) fit: see help('isSingular')
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: FW_shoot_g ~ Genotype + (1 | Row) + (1 | Column)
##    Data: endpoint_clean
## 
## REML criterion at convergence: 1256.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.83300 -0.62180 -0.01319  0.63786  2.35488 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Column   (Intercept)    0      0.00   
##  Row      (Intercept)    0      0.00   
##  Residual             4512     67.17   
## Number of obs: 125, groups:  Column, 20; Row, 7
## 
## Fixed effects:
##                  Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)      179.8056    22.3895 109.0000   8.031 1.23e-12 ***
## GenotypeEPPN02_H -34.8700    31.6635 109.0000  -1.101    0.273    
## GenotypeEPPN03_H -36.4046    30.8618 109.0000  -1.180    0.241    
## GenotypeEPPN04_H   0.8844    33.8497 109.0000   0.026    0.979    
## GenotypeEPPN05_H   1.2437    29.1262 109.0000   0.043    0.966    
## GenotypeEPPN06_H   8.9206    27.4214 109.0000   0.325    0.746    
## GenotypeEPPN07_L -14.5856    52.5080 109.0000  -0.278    0.782    
## GenotypeEPPN08_H  23.7130    33.8497 109.0000   0.701    0.485    
## GenotypeEPPN09_H  14.0154    30.1900 109.0000   0.464    0.643    
## GenotypeEPPN10_H  30.1459    33.8497 109.0000   0.891    0.375    
## GenotypeEPPN10_L -48.0706    52.5080 109.0000  -0.915    0.362    
## GenotypeEPPN11_H -15.4622    31.6635 109.0000  -0.488    0.626    
## GenotypeEPPN11_L -46.8089    44.7790 109.0000  -1.045    0.298    
## GenotypeEPPN12_H   1.9964    30.8618 109.0000   0.065    0.949    
## GenotypeEPPN13_H  42.1304    37.4648 109.0000   1.125    0.263    
## GenotypeEPPN20_T -48.2056    44.7790 109.0000  -1.077    0.284    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
## Type III Analysis of Variance Table with Satterthwaite's method
##          Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## Genotype  67131  4475.4    15   109   0.992 0.4688
## boundary (singular) fit: see help('isSingular')
## boundary (singular) fit: see help('isSingular')
## ANOVA-like table for random-effects: Single term deletions
## 
## Model:
## FW_shoot_g ~ Genotype + (1 | Row) + (1 | Column)
##              npar  logLik    AIC LRT Df Pr(>Chisq)
## <none>         19 -628.29 1294.6                  
## (1 | Row)      18 -628.29 1292.6   0  1          1
## (1 | Column)   18 -628.29 1292.6   0  1          1

2. Linear models with Plant_type

Model with X as Plant_type instead of Genotype, and row and column effects as random effects. Plant_type is defined as H for Hybrid, L for pure Line and T for Tester.
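One possible way to derive Plant_type is to take the suffix of the genotype name. This is a sketch under the assumption that the letter after the last underscore encodes the plant type; how the column was actually built is not shown in this report. T is set as the reference level so that the coefficients are named Plant_typeH and Plant_typeL, as in the output of this section.

```r
genotype <- c("EPPN01_H", "EPPN07_L", "EPPN20_T")

# Keep only the text after the last underscore
suffix <- sub(".*_", "", genotype)

# Tester (T) as the reference level, then Hybrid (H) and Line (L)
plant_type <- factor(suffix, levels = c("T", "H", "L"))

data.frame(genotype, plant_type)
```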

## boundary (singular) fit: see help('isSingular')
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: DW_shoot_g ~ Plant_type + (1 | Row) + (1 | Column)
##    Data: endpoint_clean
## 
## REML criterion at convergence: 967
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.5048 -0.5790 -0.1394  0.2229  6.4321 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Column   (Intercept)   0.00    0.000  
##  Row      (Intercept)   3.56    1.887  
##  Residual             149.49   12.227  
## Number of obs: 125, groups:  Column, 20; Row, 7
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  30.8187     7.1286 121.8536   4.323 3.16e-05 ***
## Plant_typeH   5.5773     7.1843 120.0304   0.776    0.439    
## Plant_typeL   0.2128     8.4775 120.0526   0.025    0.980    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Plnt_H
## Plant_typeH -0.982       
## Plant_typeL -0.832  0.826
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

3. Linear models with asreml library

## ASReml Version 4.2 24/05/2024 03:11:33
##           LogLik        Sigma2     DF     wall
##  1     -343.3150      133.2052    109   03:11:33
##  2     -342.5665      138.0757    109   03:11:33  (  1 restrained)
##  3     -342.1558      145.5915    109   03:11:33  (  1 restrained)
##  4     -342.1411      145.5259    109   03:11:33  (  1 restrained)
##  5     -342.1406      145.5289    109   03:11:33  (  1 restrained)
##  6     -342.1405      145.5307    109   03:11:33  (  1 restrained)

##            component std.error   z.ratio bound %ch
## Row     4.200342e+00  7.844122 0.5354764     P   0
## Column  6.708305e-06        NA        NA     B  NA
## units!R 1.455307e+02 22.309571 6.5232389     P   0

PROBLEM IN THIS BLOCK

4. Linear models with Soil variable

Model with Soil as the explanatory variable.

PROBLEM IN THIS BLOCK

MULTIVARIATE DATA ANALYSIS

PROBLEM IN THIS BLOCK: PCA, clustering, etc.; see p. 56 of Biométrie 1

2. Exploration of the timeseries data

In this part, we look at the timeseries, S_timeseries and T_timeseries datasets.

Number of data observations per day for the traits of the timeseries datasets

TO BE PUT BACK INTO THE CODE LATER

library(ggplot2)
library(patchwork)  # provides `+`, `&` and plot_layout() for combining plots

h1 <- ggplot(timeseries, aes(x = Date)) +
  geom_bar(aes(fill = Genotype), position = "stack", width = 1) +
  scale_fill_viridis_d(option = "D") +
  labs(x = "Date", y = "Number of observations",
       title = "Observations per day for timeseries_shoot_and_plant") +
  scale_y_continuous(breaks = seq(from = 0, to = 325, by = 25)) +
  scale_x_date(date_breaks = "2 days", date_labels = "%d-%m-%Y") +  # example date format
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  # rotate the date labels
        panel.grid.major.x = element_line(color = "lightgray", linewidth = 0.5),  # grid settings
        panel.grid.minor.x = element_blank())  # remove the minor grid lines

h3 <- ggplot(T_timeseries, aes(x = Date)) +
  geom_bar(aes(fill = Genotype), position = "stack", width = 1) +
  scale_fill_viridis_d(option = "D") +
  labs(x = "Date", y = "Number of observations",
       title = "Observations per day for T_timeseries_shoot") +
  scale_y_continuous(breaks = seq(from = 0, to = 325, by = 25)) +
  scale_x_date(date_breaks = "2 days", date_labels = "%d-%m-%Y") +  # example date format
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  # rotate the date labels
        panel.grid.major.x = element_line(color = "lightgray", linewidth = 0.5),  # grid settings
        panel.grid.minor.x = element_blank())  # remove the minor grid lines

# h2 is not defined in this excerpt; it must be created before this line runs
combined <- h1 + h2 + h3 & theme(legend.position = "top")
combined + plot_layout(guides = "collect")

A. Exploration of the timeseries dataframe

Variables

First, we extract the variables of the timeseries dataframe.

B. Exploration of the S_timeseries dataframe

Variables

## [1] "S_Height_cm"       "S_Height_pixel"    "S_Area_cmsquared" 
## [4] "S_Area_pixel"      "S_Perimeter_cm"    "S_Perimeter_pixel"
## [7] "S_Compactness"     "S_Width_cm"        "S_Width_pixel"

Time point object

1. endpoint

Raw data

## timePoint_endpoint contains data for experiment EPPN2020_NaPPI.
## 
## It contains 1 time points.
## First time point: 2020-07-05 
## Last time point: 2020-07-05 
## 
## No check genotypes are defined.
##   timeNumber  timePoint
## 1          1 2020-07-05

Count the number of observations per trait.

## [1] "How many observations for DW_shoot_g"
## 2020-07-05 
##        125 
## [1] "How many observations for FW_shoot_g"
## 2020-07-05 
##        125
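The counts above amount to the number of non-missing values per time point. A base R sketch of that computation on toy data (the data frame and its values are illustrative, not the NaPPI data):

```r
toy <- data.frame(
  timePoint  = as.Date(c("2020-07-05", "2020-07-05", "2020-07-05")),
  DW_shoot_g = c(33.8, NA, 30.4)
)

# Number of non-missing observations per time point
n_valid <- tapply(!is.na(toy$DW_shoot_g), format(toy$timePoint), sum)
n_valid
```

This explains why 125 observations are reported for a data frame of 126 rows: the single missing value per trait is not counted.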

Check the layout at the only timepoint.

##   timeNumber  timePoint
## 1          1 2020-07-05

Check the heatmap of the raw data at harvest.

After outlier detection

Count the number of observations per trait

## [1] "How many observations for DW_shoot_g"
## 2020-07-05 
##        125 
## [1] "How many observations for FW_shoot_g"
## 2020-07-05 
##        125

Check the heatmap of the data at harvest.

2. S_timeseries

## timePoint_S contains data for experiment EPPN2020_NaPPI.
## 
## It contains 13 time points.
## First time point: 2020-06-17 
## Last time point: 2020-07-05 
## 
## No check genotypes are defined.
##    timeNumber  timePoint
## 1           1 2020-06-17
## 2           2 2020-06-22
## 3           3 2020-06-23
## 4           4 2020-06-24
## 5           5 2020-06-25
## 6           6 2020-06-27
## 7           7 2020-06-28
## 8           8 2020-06-29
## 9           9 2020-06-30
## 10         10 2020-07-01
## 11         11 2020-07-02
## 12         12 2020-07-04
## 13         13 2020-07-05

Variables and number of observations

We choose the variables that we want to see. Count the number of observations per variable.

## [1] "How many observations for S_Height_cm"
## 2020-06-17 2020-06-22 2020-06-23 2020-06-24 2020-06-25 2020-06-27 2020-06-28 
##        125        125        125        125        125        125        125 
## 2020-06-29 2020-06-30 2020-07-01 2020-07-02 2020-07-04 2020-07-05 
##        125        125        125        125        125        125 
## [1] "How many observations for S_Area_cmsquared"
## 2020-06-17 2020-06-22 2020-06-23 2020-06-24 2020-06-25 2020-06-27 2020-06-28 
##        125        125        125        125        125        125        125 
## 2020-06-29 2020-06-30 2020-07-01 2020-07-02 2020-07-04 2020-07-05 
##        125        125        125        125        125        125 
## [1] "How many observations for S_Perimeter_cm"
## 2020-06-17 2020-06-22 2020-06-23 2020-06-24 2020-06-25 2020-06-27 2020-06-28 
##        125        125        125        125        125        125        125 
## 2020-06-29 2020-06-30 2020-07-01 2020-07-02 2020-07-04 2020-07-05 
##        125        125        125        125        125        125 
## [1] "How many observations for S_Compactness"
## 2020-06-17 2020-06-22 2020-06-23 2020-06-24 2020-06-25 2020-06-27 2020-06-28 
##        125        125        125        125        125        125        125 
## 2020-06-29 2020-06-30 2020-07-01 2020-07-02 2020-07-04 2020-07-05 
##        125        125        125        125        125        125 
## [1] "How many observations for S_Width_cm"
## 2020-06-17 2020-06-22 2020-06-23 2020-06-24 2020-06-25 2020-06-27 2020-06-28 
##        125        125        125        125        125        125        125 
## 2020-06-29 2020-06-30 2020-07-01 2020-07-02 2020-07-04 2020-07-05 
##        125        125        125        125        125        125

Genotypic layout

Check the genotypic layout at every timepoint.

##    timeNumber  timePoint
## 1           1 2020-06-17
## 2           2 2020-06-22
## 3           3 2020-06-23
## 4           4 2020-06-24
## 5           5 2020-06-25
## 6           6 2020-06-27
## 7           7 2020-06-28
## 8           8 2020-06-29
## 9           9 2020-06-30
## 10         10 2020-07-01
## 11         11 2020-07-02
## 12         12 2020-07-04
## 13         13 2020-07-05

Raw data visualisation

Heatmap of raw data

Check the heatmap of the raw data at all the time points

Time course of raw data

Check some time courses of raw data

Boxplots of raw data

Correlation plots of raw data

Outliers detection

For first trait

Single-observation outliers are detected with the detectSingleOut function. We first select a subset of plants to tune the confIntSize and nnLocfit settings.
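To illustrate the idea behind this kind of detection, the sketch below fits a local regression to one toy time course and flags points that lie far from the fitted curve. It uses base R `loess` and a robust residual scale, so it is only a loose analogue and not the implementation used in this report. The names `k` and `span` are hypothetical stand-ins that play roles similar to confIntSize and nnLocfit.

```r
day    <- 1:13
noise  <- c(0.3, -0.5, 0.2, -0.1, 0.4, -0.3, 0, 0.5, -0.2, 0.1, -0.4, 0.3, -0.1)
height <- 10 + 3 * day + noise   # toy growth curve for one plant
height[7] <- height[7] + 25      # inject one artificial outlier

# Robust local linear regression; span controls the neighbourhood size
fit <- loess(height ~ day, span = 0.75, degree = 1, family = "symmetric")
res <- residuals(fit)

# Flag points whose residual exceeds k robust standard deviations
k <- 5
flagged <- abs(res) > k * mad(res)
which(flagged)
```

A wider band (larger `k`) flags fewer points, and a larger `span` gives a smoother curve, which is why both settings are tuned on a subset of plants before running on all of them.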

For all the traits

We can then run the detection on all plants.

## [1] "S_Height_cm"
## timePoint
## 2020-06-17 2020-07-05 
##          7          1 
## [1] "S_Area_cmsquared"
## timePoint
## 2020-07-05 
##          1 
## No outlier for S_Perimeter_cm 
## [1] "S_Compactness"
## timePoint
## 2020-06-17 2020-06-23 
##          3          2 
## [1] "S_Width_cm"
## timePoint
## 2020-06-17 2020-07-05 
##          3          1

Data visualisation after outliers removal

Heatmap of data

Check the heatmap of the data with outliers detection at all the time points.

## No Single_outliers object found for trait S_Perimeter_cm

Time course, boxplots and correlation plots of data

## No Single_outliers object found for trait S_Perimeter_cm

Fit a model

Fit a model for all time points with no extra fixed effects.

## 2020-06-17
## 2020-06-22
## 2020-06-23
## 2020-06-24
## 2020-06-25
## 2020-06-27
## 2020-06-28
## 2020-06-29
## 2020-06-30
## 2020-07-01
## 2020-07-02
## 2020-07-04
## 2020-07-05

Model visualisation

## Output at: C:/Users/elise/Documents/Mémoire/Template/NaPPI/NaPPI_Template/testtemplate/S_Height_cm_mod.gif

## Output at: C:/Users/elise/Documents/Mémoire/Template/NaPPI/NaPPI_Template/testtemplate/S_Area_cmsquared_mod.gif

## Output at: C:/Users/elise/Documents/Mémoire/Template/NaPPI/NaPPI_Template/testtemplate/S_Perimeter_cm_mod.gif

## Output at: C:/Users/elise/Documents/Mémoire/Template/NaPPI/NaPPI_Template/testtemplate/S_Compactness_mod.gif

## Output at: C:/Users/elise/Documents/Mémoire/Template/NaPPI/NaPPI_Template/testtemplate/S_Width_cm_mod.gif

Use the splines

## 2020-06-17
## 2020-06-22
## 2020-06-23
## 2020-06-24
## 2020-06-25
## 2020-06-27
## 2020-06-28
## 2020-06-29
## 2020-06-30
## 2020-07-01
## 2020-07-02
## 2020-07-04
## 2020-07-05

For a plant selection

Time series outliers

cutoff <- 1
thrCor <- c(0.9)[cutoff] # correlation threshold
thrPca <- c(30)[cutoff] # pca angle threshold
thrSlope <- c(0.7)[cutoff] # slope threshold

Series_test <- detectSerieOut(corrDat = Spatial_Corrected,
                           predDat = predDat,
                           coefDat = coefDat,
                           trait = paste0(trait_name, "_corr"),
                           thrCor = thrCor,
                           thrPca = thrPca,
                           thrSlope = thrSlope,
                           geno.decomp = "geno.decomp")
## Warning: The following genotypes have less than 3 plotIds and are skipped in the outlier detection:
## EPPN07_L.Line, EPPN10_L.Line
plot(Series_test, genotypes = levels(factor(Series_test$genotype)))

Spatial_Corrected_Out <- Spatial_Corrected

With the cleaned data